I. INTRODUCTION


Built-in tests and built-in test equipment, abbreviated BIT in this paper, are the names for hardware or software whose sole purpose in a system is to monitor its "health". By health we mean its operational status, readiness and availability. While supervising the system states, a BIT should also be able to detect and isolate subsystem and component failures, and so speed up repairs by minimally trained personnel.

Since the costs of digital equipment are decreasing and subsystems are becoming more and more complicated, the need for BITs will increase. This paper is an attempt to study the properties of BITs in the context of reliability theory.

In particular, we examine the effect of BITs on overall system availability. A perfect BIT will improve availability by reducing repair time. However, a BIT that itself fails frequently and has a long repair time may have a net negative effect on the availability of the total system. Graphs are presented to quantitatively relate BIT failure and repair rates to system uptime and availability.


II. DISCUSSION OF BIT USAGE


Consider a repairable system. A BIT should not, under any circumstances, cause the system to fail. But a BIT is just another system, so it might also fail to perform its function properly. In such a case several possibilities might arise:
Case 1:

A BIT "failure" might cause the system to stop its operation and be sent to maintenance. For example, although a rocket navigational system shows apparently correct outputs, the supervising officer will not declare the system state "Go" if a BIT indicates that something is wrong. Instead, a repair action will be requested.

Case 2:

A BIT "failure" does not suspend the system operation, but the BIT is immediately repaired or replaced. This case is possible only when the BIT and the system are physically separated, as in acoustic monitoring of engines or turbines.

Case 3:

A BIT "failure" does not cause the system to stop functioning, but its repair must wait for regular maintenance or for the system to break down. This is the most commonly encountered situation.

Case 4:

A BIT "failure" is ignored. For example, most people do not care about the TV channel number indication as long as they can see their favorite shows.

In the following discussion, we deal with systems that consist of modules, subsystems or subassemblies. Whenever the system fails, the faulty unit is replaced by one in working condition. We call such a unit a block.

We first discuss systems in which each subsystem is monitored by exactly one BIT. Then we present systems that can be reduced to this situation.
[Figure 1: timelines of the system and its BIT for CASE 1 to CASE 4 against time, with segments marked "repair, system in operation", "repair of system and BIT" and "repair of system".]

Figure 1: Different scenarios: cases of BIT failures in repairable systems.


Common-cause failures, such as breakdowns due to heat, vibration and radiation, will cause the whole system to fail. Although these failures are unavoidable, we will not consider them here. Rather, we concentrate on system failures caused by failures of individual blocks. In this situation the assumption of independence between the statistical properties of the system blocks can be justified. Mature designs and good protective measures against environmental overstresses can keep common-cause failures to a minimum.

Cases 1, 2, 3 and 4, described above, are treated under each of two different assumptions:

Assumption I: Failure-repair processes in different blocks are statistically independent. Blocks are separately maintained. The unrealistic aspect of this assumption is that blocks or subsystems are still in operation, or at least aging, even when the system is down.

Assumption II: Blocks have a series-type reliability relation. When a block fails, the system fails and the other blocks do not function, so that these blocks cannot fail and do not age. We will refer to this situation as the state of "suspended animation" (Barlow, 1982).

Although assumption I requires that there be at least as many repair facilities as there are blocks in the system, this is not an essential constraint; we simply do not want to introduce queuing problems into the analysis.

The properties of real systems lie somewhere between these two extreme assumptions, so they can be assessed by interpreting the results for these two cases.


[Figure 2: MODULARITY, units "blocks": each block contains 1 subsystem and 1 BIT; blocks that can be reduced to this; general structure in the block. CASE 1: BIT indication can send the system to maintenance; CASE 2: BIT separately maintained; CASE 3: BIT failure waits; CASE 4: BIT failure ignored. ASSUMPTION I: blocks statistically independent; ASSUMPTION II: blocks in series, "suspended animation".]

Figure 2: Summary of assumptions. Chapters are divided according to the structure in the blocks. Each discussion is then subdivided into 4 cases, each under 2 assumptions.
In the next chapter we first review the notions of structures and coherency, and then we proceed to the availability considerations of BIT-equipped systems.

III. STRUCTURE FUNCTIONS

In this chapter the notions of indicator and structure functions are reviewed.

Let the indicator variable $X_j(t)$ be defined as

$$X_j(t) = \begin{cases} 1 & \text{block } j \text{ is in the operational state at time } t \\ 0 & \text{block } j \text{ is ``failed'' at } t \end{cases} \qquad (3.1)$$

Since the systems we are interested in contain components for performing the intended function and components for monitoring the state of "health", we introduce two more indicator functions. First, $X_{sj}(t)$ refers to the functional component j, which we will call the jth subsystem. Second, $X_{Bj}(t)$ refers to the BIT indication of the "health" state of the jth subsystem.

$$X_{sj}(t) = \begin{cases} 1 & \text{subsystem } j \text{ is in the operational state at time } t \\ 0 & \text{subsystem } j \text{ is failed at } t \end{cases} \qquad (3.2)$$

$$X_{Bj}(t) = \begin{cases} 1 & \text{BIT } j \text{ declares subsystem } j \text{ ``OK'' at time } t \\ 0 & \text{BIT } j \text{ indications are ``not OK'' at } t \end{cases} \qquad (3.3)$$

We refer to the "operational" state as the "OK" state, rather than "not failed", because a unit does not necessarily have to fail or break down for the system to stop working. Unfortunately, it is in the nature of a BIT occasionally to show a "wrong" status of the subsystem. Most BITs determine the state of the controlled system by measuring parameters and comparing them with predetermined values. For example, a properly functioning system might show voltages, temperatures, noise levels, vibration frequencies, etc. outside the prescribed tolerances because of:

normal system variability
environmental variability
noise
interference
graceful degradation
transients
tuning

All these influences can cause "outliers", with the result that "not go" or "not OK" indications appear.


                  BIT INDICATIONS
SUBSYSTEM         "not OK"          "OK"
failed            VALID             MISSED FAULT
not failed        Type 1 error      VALID

Figure 3: Definitions of missed fault and type 1 error.


Sometimes such "wrong" indications do not last long. So-called "squawks", or intermittent failures, may for example bother only the pilot, but if the maintenance personnel encounter them, it is their duty to find out what is wrong. These are referred to as RTOK (retest OK), CND (cannot duplicate) or BCS (bench check serviceable) "failures".

Since the "fail safe" design principle is usually applied to BIT implementations, missed faults do appear but are not of the same magnitude as false alarms (FAs), and we will not elaborate on them. We do assume that true subsystem failures are self-evident. The BIT may give early warning and help locate the fault within the subsystem.

But a BIT can also physically fail, or there might be bugs in the BIT software. The indicator $X_{BITj}(t)$ will then indicate such situations:

$$X_{BITj}(t) = \begin{cases} 1 & \text{BIT } j \text{ physically operational at time } t \\ 0 & \text{BIT } j \text{ physically failed at } t \end{cases} \qquad (3.4)$$

We assume that a BIT failure always produces a "not OK" indication for the monitored block, so that a failed BIT never goes unnoticed.

[Figure 4: SUBSYSTEM j IS FAILED.]

Figure 4: Block declared "not OK"; X'(t) is the complement of the X(t) indicator. Note that we define a false alarm as an event in which both the BIT and the subsystem are functioning, not taking into account BIT physical failures.

We call a false alarm only the situation in which the BIT shows "not OK" status but both the BIT and the corresponding subsystem are operational. This nomenclature is not standard: some authors call all wrong BIT indications false alarms, whereas we treat BIT physical failures and software errors separately.
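The conventions of Figure 3 and of the false-alarm definition above can be summarized as a small classification function. This is a minimal sketch: the enum and function names are hypothetical, and the code simply encodes the stated assumption that a physically failed BIT always reads "not OK" and is counted separately from false alarms.

```python
from enum import Enum


class Outcome(Enum):
    VALID = "valid indication"
    MISSED_FAULT = "missed fault"
    FALSE_ALARM = "false alarm (type 1 error)"
    BIT_FAILURE = "BIT physical failure or software error"


def classify(subsystem_ok: bool, bit_physically_ok: bool, bit_says_ok: bool) -> Outcome:
    """Label one observation of a subsystem and its BIT (hypothetical helper, Figure 3 conventions)."""
    if not bit_physically_ok:
        # A failed BIT is assumed to show "not OK"; it is not counted as a false alarm.
        return Outcome.BIT_FAILURE
    if subsystem_ok and not bit_says_ok:
        return Outcome.FALSE_ALARM      # BIT and subsystem both work, yet "not OK"
    if not subsystem_ok and bit_says_ok:
        return Outcome.MISSED_FAULT     # subsystem failed, BIT still says "OK"
    return Outcome.VALID                # indication agrees with the subsystem state


print(classify(subsystem_ok=True, bit_physically_ok=True, bit_says_ok=False))
```

Separating `BIT_FAILURE` from `FALSE_ALARM` mirrors the nomenclature chosen here, which differs from authors who lump all wrong indications together.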

As the indicator variables were introduced at the component level, we also introduce an indicator variable that characterizes the state of the whole system. Since a system consists of its elements, we call the system's indicator variable the structure function $\phi$ of the system:

$$\phi([X_j(t)]) = \begin{cases} 1 & \text{the system is in the ``OK'' state at time } t \\ 0 & \text{otherwise} \end{cases} \qquad (3.5)$$

where $[X_j(t)] = (X_1(t), X_2(t), \ldots, X_n(t))$ denotes the vector of the component indicators.

Most reliability theory results relate to coherent structures. Coherent structures are those in which every component affects or influences the system's state, or, more formally,

$$\phi([X_j(t)]) \text{ is increasing in every } X_j(t), \text{ and every component is relevant.} \qquad (3.6)$$


Addition of a BIT to a system should not affect its reliability, as mentioned at the beginning. The BIT is therefore an irrelevant component in the sense of (3.6), so the structure of a BIT-monitored system is not coherent. Obviously, we cannot afford to build an airplane that will crash just because an indication went wrong. So, from the reliability viewpoint, the overall system, including its primary function and its BIT, has a noncoherent structure. But when the system is maintained, the BIT plays a crucial role. Increased complexity will affect the system's reliability, but faster repairs might still increase the availability of the system.

In classical systems, most of the repair time is usually needed to locate a failed component. The BIT is there precisely to reduce this time. So the system is truly coherent from the availability and readiness viewpoint, although its structure is "noncoherent" with respect to reliability.

At this point, we introduce two structure functions, $\phi_R$ and $\phi_A$. The reliability, or classical, structure $\phi_R$ describes the coherent structure, i.e., the relevant organization of subsystems to perform the intended function. The availability structure $\phi_A$, on the other hand, deals with time aspects of the system, such as readiness.

$$\phi_R([X_j(t)]) = \begin{cases} 1 & \text{system is functioning ``OK'' at time } t \\ 0 & \text{system is ``not OK'' at } t \end{cases}$$

$$\phi_A([X_j(t)]^*) = \begin{cases} 1 & \text{system is declared ``OK'' at } t \\ 0 & \text{system is not declared ``OK'' at } t \end{cases} \qquad (3.7)$$

where $[X_j(t)] = (X_{s1}(t), X_{s2}(t), \ldots, X_{sn}(t))$ is the vector of subsystem indicators and $[X_j(t)]^* = (X_{s1}(t), X_{s2}(t), \ldots, X_{sn}(t), X_{B1}(t), X_{B2}(t), \ldots, X_{Bm}(t))$ is the augmented vector of the BIT-monitored system.


To discuss the relationship between the two structure functions, we first consider the situation where every subsystem is monitored by only one BIT and separate subsystems have separate BITs.


[Figure 5: two panels, OPERATIONAL USAGE (left) and MAINTENANCE USAGE.]

Figure 5: Coherent structure $\phi([X_j(t)])$. During system operation, the BIT should not influence the performance of the system (left). But when the system is maintained, the BIT shortens repair and thus increases availability and readiness.


III.1 SITUATION WHERE EACH SUBSYSTEM IS MONITORED BY ONE BIT


Since every subsystem is monitored by exactly one BIT, we can bring the two together and call the new entity a block. Thus the jth block consists of the jth subsystem and the corresponding BIT. A block is just an augmented subsystem.

When a system is maintained, a block will always be declared "OK" if both the subsystem and its BIT are functioning properly. When the subsystem is "not OK" and the BIT functions correctly, the block is not ready; it is "not OK". The same happens when the subsystem is "OK" but the BIT indication is wrong.


SUBSYSTEM Sj    BUILT-IN-TEST INDICATION Bj    BLOCK j
OK              OK                             OK
OK              not OK                         not OK
not OK          OK                             not OK
not OK          not OK                         not OK

Figure 6: Availability viewpoint on the blocks.

If $X_j(t)$ is the indicator function of block j, and $X_{sj}(t)$ and $X_{Bj}(t)$ are the indicator variables of the jth subsystem and of the corresponding BIT declaration, then

$$X_j(t) = X_{sj}(t) \cdot X_{Bj}(t) \qquad (3.8)$$

so that, for availability purposes, the subsystem and the corresponding BIT are connected in series:

$$\phi_A([X_{sj}(t), X_{Bj}(t)]) = \phi_R([X_{sj}(t) \cdot X_{Bj}(t)]) \qquad (3.9)$$

where $[X_{sj}(t), X_{Bj}(t)] = (X_{s1}(t), X_{B1}(t), X_{s2}(t), X_{B2}(t), \ldots, X_{sn}(t), X_{Bn}(t))$ is the augmented vector of indicators of the subsystems and BITs, and $[X_{sj}(t) \cdot X_{Bj}(t)] = (X_{s1}(t) X_{B1}(t), X_{s2}(t) X_{B2}(t), \ldots, X_{sn}(t) X_{Bn}(t))$, where the dot denotes the product.
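As a concrete check of (3.8) and (3.9), the sketch below feeds products of subsystem and BIT indicators into a reliability structure. Purely for illustration it assumes that $\phi_R$ is a series structure (the helper names `phi_series` and `phi_availability` are hypothetical); any other coherent $\phi_R$ could be passed instead. The loop prints the four rows of Figure 6.

```python
from itertools import product


def phi_series(x):
    """Hypothetical reliability structure phi_R: a series system, up only if every entry is 1."""
    return int(all(x))


def phi_availability(phi_r, xs, xb):
    """Eq. (3.9): phi_A([X_s, X_B]) = phi_R([X_s * X_B]); each subsystem in series with its BIT."""
    return phi_r([s * b for s, b in zip(xs, xb)])


# Figure 6 for a single block: the block is "OK" only if subsystem and BIT are both "OK".
for s, b in product((1, 0), repeat=2):
    print(f"subsystem={s} BIT={b} block={phi_availability(phi_series, [s], [b])}")
```

Because the product enters $\phi_R$ unchanged, a coherent $\phi_R$ stays coherent in the augmented indicators, which is why the availability structure can be analyzed with the classical tools reviewed next.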
Figure 7: Relation between $\phi_R$ and $\phi_A$ in the situation where every subsystem $S_j$ is monitored by a separate BIT/BITE $B_j$. The structure that is noncoherent from the reliability viewpoint becomes coherent, with subsystem and BIT/BITE connected in series.


Since the addition of components always increases complexity, we now proceed to show both the benefits and the drawbacks of equipping systems with BITs. To appreciate the addition of a BIT, we have to look into the time behavior of the system's operation and repairs in some detail. We limit ourselves to patterns that can be described by alternating renewal processes.

IV. SHORT OVERVIEW OF CLASSICAL RESULTS

IV.1 ASSUMPTION I: PROCESSES IN STATISTICALLY INDEPENDENT BLOCKS

Let $F_j$ be the distribution function of the failure times of the jth component, and let $G_j$ similarly be the distribution function of the repair times. The renewal function $\Lambda_j(t)$ of the embedded renewal process of failures on $(0,t)$ is, by definition, the expected number of failures of the jth component:

$$\Lambda_j(t) = \sum_{k=0}^{\infty} F_j^{(k+1)} * G_j^{(k)}(t) \qquad (4.1)$$

where $*$ denotes the Stieltjes convolution and $(k)$ denotes the k-fold recursive convolution. For $k = 0$, let $G_j^{(0)}$ be the unit step function. Similarly, the embedded process of repairs can be described by

$$\Xi_j(t) = \sum_{k=1}^{\infty} F_j^{(k)} * G_j^{(k)}(t) \qquad (4.2)$$

where $\Xi_j(t)$ is the expected number of repairs on $(0,t)$.

$$\lambda_j(t) = \frac{d\Lambda_j(t)}{dt}, \qquad \xi_j(t) = \frac{d\Xi_j(t)}{dt} \qquad (4.3)$$

$\lambda_j(t)$ and $\xi_j(t)$ are the corresponding renewal densities.

When there is no BIT present, we can introduce the point availability $A_j(t) = P\{X_j(t) = 1\}$. It can also be expressed as

$$A_j(t) = \bar F_j(t) + \int_0^t \bar F_j(t-u)\, d\Xi_j(u), \qquad \bar F_j = 1 - F_j \qquad (4.4)$$

Component j is available at time t if it has not failed by t, or if it has been repaired and has not failed since. We call $A_j$ the limit of $A_j(t)$ as $t \to \infty$, if it exists:

$$A_j = \lim_{t \to \infty} A_j(t) = \frac{\mu_j}{\mu_j + \nu_j} \qquad (4.5)$$

where, in this well-established result, $\mu_j$ and $\nu_j$ are the means of $F_j$ and $G_j$, respectively.
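Equation (4.5) can be checked numerically by simulating a single alternating renewal process. The sketch below assumes, only for the simulation, exponential up times with mean $\mu$ and exponential repair times with mean $\nu$ (the limit itself holds for any $F_j$, $G_j$ with those means); the function names and the values 100 and 5 are hypothetical.

```python
import random


def limiting_availability(mu: float, nu: float) -> float:
    """Eq. (4.5): long-run availability of one block with mean up time mu and mean repair time nu."""
    return mu / (mu + nu)


def simulate_availability(mu: float, nu: float, horizon: float, seed: int = 1) -> float:
    """Fraction of [0, horizon] spent up in one alternating renewal process.

    Illustrative assumption: exponential up times (mean mu) and exponential
    repair times (mean nu).
    """
    rng = random.Random(seed)
    t, up_time, up = 0.0, 0.0, True
    while True:
        duration = rng.expovariate(1.0 / (mu if up else nu))
        if t + duration >= horizon:      # last period, truncated at the horizon
            if up:
                up_time += horizon - t
            break
        if up:
            up_time += duration
        t += duration
        up = not up                      # alternate between failure and repair
    return up_time / horizon


print(limiting_availability(100.0, 5.0), simulate_availability(100.0, 5.0, 1e6))
```

The two printed values should agree to roughly two or three decimal places; since (4.5) is a limit, the agreement improves as `horizon` grows.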

For coherent systems, the availability is the expected value of the system indicator, i.e., the probability that the system is OK:

$$E[\phi([X_j(t)])] = h([A_j(t)]) \qquad (4.6)$$

where $[X_j(t)] = (X_1(t), \ldots, X_n(t))$, $[A_j(t)] = (A_1(t), A_2(t), \ldots, A_n(t))$, and h is called the (system) availability function. According to Barlow and Proschan (1975),

$$\lim_{t \to \infty} h([A_j(t)]) = h([A_j]) \qquad (4.7)$$

Note also that if the system is not repairable,

$$E[\phi([X_j(t)])] = h([E(X_j(t))]) \qquad (4.8)$$

Further, the probability that the system will fail in the interval $(t, t+dt)$, denoted $\lambda(t)\,dt$, is

$$\lambda(t)\,dt = \sum_{j=1}^{n} \lambda_j(t)\, I_j(t)\, dt + o(dt) \qquad (4.9)$$

where we assumed that the probability of more than one failure in $(t, t+dt)$ is $o(dt)$. $\lambda(t)$ and $\lambda_j(t)$ are the intensity functions of the system failures and of the jth component failures. $I_j(t)$ is the reliability importance (Birnbaum, 1967) of the jth component at time t and is defined as

$$I_j(t) = h(1_j, [A_j(t)]) - h(0_j, [A_j(t)]) \qquad (4.10)$$

where $(1_j, [A_j(t)]) = (A_1(t), \ldots, A_{j-1}(t), 1, A_{j+1}(t), \ldots, A_n(t))$ and $(0_j, [A_j(t)])$ is the similarly defined vector with zero in the jth position. $I_j(t)$ is the probability that the jth component is critical at time t, i.e., that its failure will cause the system to fail. Equation (4.9) shows that the intensities of the component failures must be weighted with their "criticalities" when we assess their influence on system failure. The importance function also has the properties

$$I_j(t) = \frac{\partial h([A_j(t)])}{\partial A_j(t)} = \frac{\partial E[\phi([X_j(t)])]}{\partial E[X_j(t)]} \qquad (4.11)$$

$$I_j = \lim_{t \to \infty} I_j(t) = h(1_j, [A_j]) - h(0_j, [A_j]) \qquad (4.12)$$

If $\lambda(t)$ passes to a limit as $t \to \infty$, then, since the renewal density $\lambda_j(t)$ tends to $1/(\mu_j + \nu_j)$,

$$\lim_{t \to \infty} \lambda(t) = \sum_{j=1}^{n} \frac{I_j}{\mu_j + \nu_j}$$
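Note that the renewal function $\Lambda(t)$ of the system failures is also

$$\Lambda(t) = \int_0^t \lambda(u)\, du = \sum_{j=1}^{n} \int_0^t \lambda_j(u)\, I_j(u)\, du + o(t)$$

where $o(t)$ is negligible, and (Baxter, 1983)

$$\lim_{t \to \infty} \frac{\Lambda(t)}{t} = \lim_{t \to \infty} \lambda(t) \qquad (4.15)$$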

Since $I_j(t)$ is the probability that the failure of the jth component will cause the system's failure at time t, it is also the probability that the repair of the jth component will restore the system's function. So the probability that the system will be repaired in $(t, t+dt)$, denoted $\xi(t)\,dt$, is

$$\xi(t)\,dt = \sum_{j=1}^{n} \xi_j(t)\, I_j(t)\, dt + o(dt) \qquad (4.16)$$

and, as above,

$$\Xi(t) = \int_0^t \xi(u)\, du = \sum_{j=1}^{n} \int_0^t \xi_j(u)\, I_j(u)\, du \qquad (4.17)$$

where $\Xi(t)$ is the expected number of repairs, and, as above,

$$\lim_{t \to \infty} \frac{\Xi(t)}{t} = \sum_{j=1}^{n} \frac{I_j}{\mu_j + \nu_j} = \lim_{t \to \infty} \frac{\Lambda(t)}{t} \qquad (4.18)$$


This shows that, after a long time, the expected number of repairs per unit time equals the expected number of failures per unit time. Let $U_1, U_2, \ldots, U_k$ denote the successive system uptimes; then

$$\lim_{k \to \infty} \frac{E[U_1 + U_2 + \cdots + U_k]}{k} = \frac{h([A_j])}{\sum_{j=1}^{n} I_j/(\mu_j + \nu_j)} \qquad (4.19)$$

and similarly, if $D_1, D_2, \ldots, D_k$ are the successive system downtimes,

$$\lim_{k \to \infty} \frac{E[D_1 + D_2 + \cdots + D_k]}{k} = \frac{1 - h([A_j])}{\sum_{j=1}^{n} I_j/(\mu_j + \nu_j)} \qquad (4.20)$$

Thus, the long-run average system uptime and downtime are easily calculated.
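To make (4.6) and (4.10) to (4.20) concrete, the sketch below computes $h([A_j])$ by enumerating all component states (practical only for small n), obtains the Birnbaum importances from (4.12), and returns the limiting failure rate and the mean up- and downtimes. It assumes independent components (assumption I); the series structure and the numbers are hypothetical examples.

```python
from itertools import product
from typing import Callable, Sequence

Structure = Callable[[Sequence[int]], int]


def availability_function(phi: Structure, a: Sequence[float]) -> float:
    """h([A_j]) = E[phi([X_j])] for independent components with P(X_j = 1) = a[j], Eq. (4.6)."""
    total = 0.0
    for x in product((0, 1), repeat=len(a)):    # all 2**n component states
        p = 1.0
        for xj, aj in zip(x, a):
            p *= aj if xj else 1.0 - aj
        total += p * phi(x)
    return total


def birnbaum_importance(phi: Structure, a: Sequence[float], j: int) -> float:
    """I_j = h(1_j, [A]) - h(0_j, [A]), Eqs. (4.10) and (4.12)."""
    a_one, a_zero = list(a), list(a)
    a_one[j], a_zero[j] = 1.0, 0.0
    return availability_function(phi, a_one) - availability_function(phi, a_zero)


def long_run_summary(phi: Structure, mu: Sequence[float], nu: Sequence[float]):
    """Return h([A_j]), the limiting failure rate and mean up/downtime, Eqs. (4.5), (4.18)-(4.20)."""
    a = [m / (m + v) for m, v in zip(mu, nu)]
    h = availability_function(phi, a)
    rate = sum(birnbaum_importance(phi, a, j) / (mu[j] + nu[j]) for j in range(len(a)))
    return h, rate, h / rate, (1.0 - h) / rate


def series(x: Sequence[int]) -> int:
    """Hypothetical example structure: every component must work."""
    return int(all(x))


h, rate, mean_up, mean_down = long_run_summary(series, [100.0, 200.0], [5.0, 10.0])
print(h, rate, mean_up, mean_down)
```

For this two-component series example the mean uptime is 200/3, about 66.7, which agrees with 1/(1/100 + 1/200), the mean time to the next failure of two exponential components in series; the mean downtime is 41/6, about 6.83.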
All the above results are valid under assumption I: failure-repair processes in different component positions are statistically independent. Next, results under the second assumption are reviewed.


IV.2 ASSUMPTION II: SUSPENDED ANIMATION

The following results hold for the series system in which the components are shut down until the failed component is fixed.

The long-run average system availability $A_{av}$ is defined as

$$A_{av} = \lim_{t \to \infty} \frac{1}{t} \int_0^t A(u)\, du = \frac{1}{1 + \sum_{j=1}^{n} \nu_j/\mu_j} \qquad (4.21)$$


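If the limit of $A(t)$ exists, then it is equal to $A_{av}$. The limiting average expected numbers of system failures per unit time caused by the jth component, and in total, are

$$\lim_{t \to \infty} \frac{\Lambda_j(t)}{t} = \frac{A_{av}}{\mu_j}, \qquad \lim_{t \to \infty} \frac{\Lambda(t)}{t} = A_{av} \sum_{j=1}^{n} \frac{1}{\mu_j} \qquad (4.22)$$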


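and the long-run averages of the system uptimes (downtimes) are, similarly as before,

$$\lim_{k \to \infty} \frac{E(U_1 + U_2 + \cdots + U_k)}{k} = \mu, \qquad \mu = \Big(\sum_{j=1}^{n} \frac{1}{\mu_j}\Big)^{-1} \qquad (4.23)$$

$$\lim_{k \to \infty} \frac{E(D_1 + D_2 + \cdots + D_k)}{k} = \mu\, \frac{1 - A_{av}}{A_{av}} \qquad (4.24)$$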
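The suspended-animation formulas (4.21) to (4.24) are easy to evaluate directly. The sketch below does so and adds a Monte Carlo check under an extra assumption not made in the text: exponential lifetimes and repair times. With memoryless lifetimes, freezing the working blocks during a repair is equivalent to drawing fresh failure times when the system comes back up. Function names and numbers are hypothetical.

```python
import random


def suspended_animation_summary(mu, nu):
    """Eqs. (4.21)-(4.24) for a series system whose working blocks stop aging while it is down."""
    load = sum(v / m for m, v in zip(mu, nu))      # sum of nu_j / mu_j
    a_av = 1.0 / (1.0 + load)                       # (4.21)
    per_block = [a_av / m for m in mu]              # (4.22): failures caused by block j per unit time
    mean_up = 1.0 / sum(1.0 / m for m in mu)        # (4.23)
    mean_down = mean_up * (1.0 - a_av) / a_av       # (4.24)
    return a_av, per_block, mean_up, mean_down


def simulate_suspended_animation(mu, nu, horizon, seed=7):
    """Monte Carlo estimate of A_av assuming exponential lifetimes and repair times."""
    rng = random.Random(seed)
    rates = [1.0 / m for m in mu]
    total_rate = sum(rates)
    t = up_time = 0.0
    while t < horizon:
        up = rng.expovariate(total_rate)                    # time to the next block failure
        up_time += min(up, horizon - t)
        t += up
        j = rng.choices(range(len(mu)), weights=rates)[0]   # block j fails w.p. rate_j / total
        t += rng.expovariate(1.0 / nu[j])                   # system down while block j is repaired
    return up_time / horizon


a_av, per_block, mean_up, mean_down = suspended_animation_summary([100.0, 200.0], [5.0, 10.0])
print(a_av, simulate_suspended_animation([100.0, 200.0], [5.0, 10.0], 1e6))
```

With these numbers $A_{av} = 1/1.1 \approx 0.909$, the mean uptime is about 66.7 and the mean downtime about 6.67; the simulated availability should be close to $A_{av}$.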
V. AVAILABILITY CONSIDERATIONS
The previous results will now be applied to BIT-monitored systems constructed of blocks, each of which contains one subsystem and its BIT. After a general discussion of failure rates in such systems, we develop approximate system availabilities by estimating the mean time to failure and the mean time to repair for a BIT-monitored block.


V.1 FAILURE RATES OF BIT MONITORED SYSTEMS
Case 1: BIT indication might send the system to maintenance.

In this case, the BIT influences the system status. We have introduced two measures of system effectiveness, namely reliability and availability, so we define two h-functions:

$$h_R([A_j(t)]) = E[\phi_R([X_j(t)])], \qquad h_A([A_j(t)]^*) = E[\phi_A([X_j(t)]^*)] \qquad (5.1)$$

where $[A_j(t)] = [E(X_{sj}(t))] = E([X_{sj}(t)])$ and $[A_j(t)]^* = [A_{sj}(t), A_{Bj}(t)] = [E(X_{sj}(t)), E(X_{Bj}(t))]$. As before, the BIT components are discarded in $h_R(\cdot)$.

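$$h_A([A_j(t)]^*) = E(\phi_A([X_j(t)]^*)) = E(\phi_A([X_{sj}(t), X_{Bj}(t)])) = E(\phi_R([X_{sj}(t) \cdot X_{Bj}(t)]))$$

or, provided the subsystem and BIT indicators are independent, so that $E[X_{sj}(t) X_{Bj}(t)] = A_{sj}(t) A_{Bj}(t)$,

$$h_A([A_{sj}(t), A_{Bj}(t)]) = h_R([A_{sj}(t) \cdot A_{Bj}(t)]) \qquad (5.2)$$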

Since $\phi_A$ is coherent, we can use the classical result (4.9) for $\lambda_A(t)\,dt$, the probability that the system will fail from the availability aspect:

$$\lambda_A(t)\,dt = \sum_{j=1}^{n} \big(I_{sj}(t)\, \lambda_{sj}(t) + I_{Bj}(t)\, \lambda_{Bj}(t)\big)\, dt + o(dt) \qquad (5.3)$$

since in the discussed Case 1, wrong BIT indications are also treated as "failures". $\lambda_{sj}(t)$ is the failure rate of the jth subsystem, and $\lambda_{Bj}(t)$ is the rate of "not OK" indications from the corresponding BIT.

$I_{sj}(t)$ and $I_{Bj}(t)$ are the reliability importances associated with the subsystem and its BIT. By the chain rule, these can be evaluated as

$$I_{sj}(t) = \frac{\partial h_A([A_j(t)]^*)}{\partial A_{sj}(t)} = \frac{\partial h_R([A_{sj}(t) \cdot A_{Bj}(t)])}{\partial A_{sj}(t)} = I_j(t)\, A_{Bj}(t) \qquad (5.4)$$

and similarly

$$I_{Bj}(t) = I_j(t)\, A_{sj}(t) \qquad (5.5)$$

where $I_j(t)$ is the reliability importance of the jth block and $A_j(t) = A_{sj}(t) A_{Bj}(t)$; thus

$$\lambda_A(t)\,dt = \sum_{j=1}^{n} I_j(t) \big[A_{Bj}(t)\, \lambda_{sj}(t) + A_{sj}(t)\, \lambda_{Bj}(t)\big]\, dt + o(dt) \qquad (5.6)$$

From the above we establish two inequalities. Since $A_{sj}(t), A_{Bj}(t) \le 1$, it follows that

$$\lambda_A(t)\,dt \le \sum_{j=1}^{n} I_j(t) \big[\lambda_{sj}(t) + \lambda_{Bj}(t)\big]\, dt = \sum_{j=1}^{n} I_j(t)\, \lambda_{sj}(t)\, dt + \sum_{j=1}^{n} I_j(t)\, \lambda_{Bj}(t)\, dt$$

$$\lambda_A(t) \le \lambda_S(t) + \lambda_B(t) \qquad (5.7)$$

where $\lambda_S(t)$ is the intensity of failures when the system consists of blocks containing only subsystems, and $\lambda_B(t)$ is similarly the intensity of failures when the system consists of blocks containing only the corresponding BITs:

$$\lambda_S(t)\,dt = \sum_{j=1}^{n} I_j(t)\, \lambda_{sj}(t)\, dt, \qquad \lambda_B(t)\,dt = \sum_{j=1}^{n} I_j(t)\, \lambda_{Bj}(t)\, dt$$

In words, the rate at which the BIT-equipped system is sent to maintenance is bounded above by the sum of two rates: the first is the rate of the system consisting only of subsystems and no BITs, and the second is the rate of the system in which the subsystems are replaced by their BITs.


Also,

$$\lambda_A(t)\,dt \ge \sum_{j=1}^{n} I_j(t)\, \lambda_{sj}(t)\, A_{Bj}(t)\, dt \qquad (5.8)$$

which follows from (5.6), since all terms are nonnegative. Since each BIT is designed to have high availability, so that $A_{Bj}(t)$ is close to 1, this inequality states a not entirely unexpected result: BIT-equipped systems have a higher intensity of failures than the equivalent system without BIT if the BIT can influence the decision about the system status (Case 1).

When the system has been in operation long enough to settle down to stationarity (steady state), we obtain

$$\lambda_A = \lim_{t \to \infty} \lambda_A(t) = \sum_{j=1}^{n} I_j \left( \frac{A_{Bj}}{\mu_{sj} + \nu_j} + \frac{A_{sj}}{\mu_{Bj} + \nu_j} \right) \qquad (5.9)$$

where $\mu_{sj}$ and $\mu_{Bj}$ are the mean time between failures of subsystem j and the mean time between "not OK" indications of its BIT, and $\nu_j$ is the mean time to repair of the jth block (subsystem and BIT), for which an approximate expression will be derived below. $I_j$, $A_{sj}$ and $A_{Bj}$ are the limiting values of $I_j(t)$, $A_{sj}(t)$ and $A_{Bj}(t)$, respectively. The inequalities (5.7) and (5.8) become

$$A_B\, \lambda_S \le \lambda_A \le \lambda_S + \lambda_B \qquad (5.10)$$

where $A_B$ is a constant such that $A_B \le A_{Bj}$ for all BITs, and $\lambda_S$, $\lambda_B$ are the steady-state values of $\lambda_S(t)$ and $\lambda_B(t)$, respectively.

This formula holds under assumption I of independent blocks.
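The steady-state rate (5.9) and the bounds (5.10) can be evaluated numerically. The sketch below rests on illustrative assumptions that are not in the text: the reliability structure $\phi_R$ is a series system (so $I_j$ is the product of the other blocks' availabilities), and $A_{sj} = \mu_{sj}/(\mu_{sj}+\nu_j)$, $A_{Bj} = \mu_{Bj}/(\mu_{Bj}+\nu_j)$. It also checks the chain-rule step (5.4) by a central finite difference. All function names and numbers are hypothetical.

```python
from math import prod


def case1_steady_state(mu_s, mu_b, nu):
    """Steady-state rate (5.9) at which a series system of BIT-monitored blocks goes to repair,
    together with lambda_S, lambda_B and A_B from the bounds (5.10).

    Assumed for illustration: series phi_R, A_sj = mu_sj/(mu_sj+nu_j), A_Bj = mu_Bj/(mu_Bj+nu_j).
    """
    n = len(nu)
    a_s = [m / (m + v) for m, v in zip(mu_s, nu)]
    a_b = [m / (m + v) for m, v in zip(mu_b, nu)]
    a_block = [s * b for s, b in zip(a_s, a_b)]                    # A_j = A_sj * A_Bj
    imp = [prod(a_block[k] for k in range(n) if k != j) for j in range(n)]
    lam_a = sum(imp[j] * (a_b[j] / (mu_s[j] + nu[j]) + a_s[j] / (mu_b[j] + nu[j]))
                for j in range(n))
    lam_s = sum(imp[j] / (mu_s[j] + nu[j]) for j in range(n))
    lam_b = sum(imp[j] / (mu_b[j] + nu[j]) for j in range(n))
    return lam_a, lam_s, lam_b, min(a_b)


def importance_by_finite_difference(a_s, a_b, j, eps=1e-6):
    """Numerical check of (5.4): dh_A/dA_sj should equal I_j * A_Bj (series h_R assumed)."""
    def h_a(s):
        return prod(x * y for x, y in zip(s, a_b))                  # h_A = h_R([A_s * A_B])
    plus, minus = list(a_s), list(a_s)
    plus[j] += eps
    minus[j] -= eps
    numeric = (h_a(plus) - h_a(minus)) / (2.0 * eps)
    analytic = prod(a_s[k] * a_b[k] for k in range(len(a_s)) if k != j) * a_b[j]
    return numeric, analytic


lam_a, lam_s, lam_b, a_b_min = case1_steady_state([1000.0, 2000.0], [5000.0, 8000.0], [10.0, 20.0])
print(a_b_min * lam_s <= lam_a <= lam_s + lam_b)                   # bounds (5.10)
print(importance_by_finite_difference([0.99, 0.98], [0.995, 0.99], 0))
```

The first line prints `True`, and the two numbers in the tuple should agree to many digits, since $h_A$ is linear in each $A_{sj}$.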

For suspended animation (assumption II) we get similar results, but since the blocks are assumed to be connected in series, the reliability importance does not appear:

$$\lambda_A \le \frac{\sum_{j=1}^{n} 1/\mu_{sj}}{1 + \sum_{j=1}^{n} \nu_j/\mu_{sj}} + \frac{\sum_{j=1}^{n} 1/\mu_{Bj}}{1 + \sum_{j=1}^{n} \nu_j/\mu_{Bj}} \qquad (5.11)$$


Cases 2, 3 and 4: BIT indications ignored


In contrast to the previous case, here the BIT declarations do not influence decisions about the system status as long as the system is in operation. Thus the failure rate stays the same as without BITs:

$$\lambda(t)\,dt = \sum_{j=1}^{n} I_j(t)\, \lambda_{sj}(t)\, dt + o(dt) \qquad (5.12)$$


To evaluate availabilities we use the asymptotic values, because the above assumptions guarantee their existence. To repeat:

$$A = h([A_j]), \qquad A_j = \frac{\mu_j}{\mu_j + \nu_j} = \frac{1}{1 + \nu_j/\mu_j} \qquad (5.21)$$

where A is the availability of the system, $A_j$ is the availability of the jth block, $\mu_j$ is the mean time between failures of the jth block, and $\nu_j$ is the mean time to repair of the jth block.

The above is valid under assumption I of independent blocks.
Under assumption II of suspended animation in series:

$$A_{av} = \frac{1}{1 + \sum_{j=1}^{n} \nu_j/\mu_j} \qquad (5.22)$$

V.2.1 ESTIMATES FOR THE MEAN TIME BETWEEN FAILURES $\mu_j$

Obviously, in Cases 2, 3 and 4, where BIT indications are ignored while the subsystem is in operation, the mean time between failures $\mu_j$ is just equal to the mean time between failures of the jth subsystem:

Cases 2, 3, 4: $\mu_j = \mu_{sj}$ (5.23)

But in Case 1, the BIT "not OK" indications might send the system to maintenance, as described in Figure 4.

Three distinct events might send our system to repair by putting the jth block into the "not OK" state: either the jth subsystem fails (I), or the jth BIT fails (II), or a false alarm occurs (III):

$$\mu_j = \mu_{sj} P(1) + \mu_{BITj} P(2) + \mu_{FAj} P(3) \qquad (5.24)$$

where P(1) is the probability that the subsystem fails before the occurrence of a BIT failure or a false alarm (that is, that I happens first), P(2) is the probability that II occurs first, and P(3) is the probability that a false alarm is the first to send the block to the "not OK" state.

We assume that the evaluation of the system availability is performed in its design phase. The availability we are interested in is then that of the system in its mature state, excluding infant mortality and the wear-out period. Furthermore, we assume that our subsystems and BITs are complex systems by themselves, so that a "quasi" constant failure rate can be applied to them. These assumptions are necessary, since the only data usually available in the design phase are constant failure rates. Although such an estimate is often not completely justified in real cases, it is quite good for comparison purposes in the design phase.

Constant failure rates yield a Poisson process for the occurrence of failures. The probability that the ith cause occurs first among n possible events is

$$P(i) = \frac{\lambda_i}{\lambda} \qquad (5.25)$$

where $\lambda_i$ is the ith failure rate and $\lambda = \sum_i \lambda_i$ is the failure rate of the sum of the n independent processes, which is also a Poisson process.
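Equation (5.25) is the standard competing-risks property of independent exponential waiting times. The sketch below compares it with a simulation; the three rates are hypothetical and, for illustration only, treat false alarms as a third constant-rate process (the text instead works with the false-alarm fraction x).

```python
import random


def first_cause_probabilities(rates):
    """Eq. (5.25): P(i) = lambda_i / lambda with lambda = sum of the independent rates."""
    total = sum(rates)
    return [r / total for r in rates]


def simulate_first_cause(rates, trials=100_000, seed=3):
    """Estimate P(i) by drawing one exponential waiting time per cause and noting the earliest."""
    rng = random.Random(seed)
    wins = [0] * len(rates)
    for _ in range(trials):
        times = [rng.expovariate(r) for r in rates]
        wins[times.index(min(times))] += 1
    return [w / trials for w in wins]


# Hypothetical rates for subsystem failure, BIT physical failure and false alarm.
rates = [1.0, 0.2, 0.05]
print(first_cause_probabilities(rates))
print(simulate_first_cause(rates))
```

The analytic probabilities are 0.8, 0.16 and 0.04, and the simulated frequencies should match them to about two decimal places.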

$$P(1) = \frac{\lambda_{sj}}{\lambda_j}, \qquad P(2) = \frac{\lambda_{BITj}}{\lambda_j}, \qquad P(3) = x \qquad (5.26)$$

where $\lambda_j$ is the failure rate of the jth block, and $\lambda_{sj}$ and $\lambda_{BITj}$ are the failure rates of subsystem j and of the physical failures of its BIT. P(3) = x is the percentage rate of false alarms, normalized to the jth block failure rate. Note that x is not a false alarm rate, but rather the false alarm percentage of all events that occur to the block. Since something has to cause the block failure,

$$P(1) + P(2) + P(3) = 1$$

or

$$\frac{\lambda_{sj} + \lambda_{BITj}}{\lambda_j} + x = 1, \qquad \lambda_j = \frac{\lambda_{sj} + \lambda_{BITj}}{1 - x} \qquad (5.27)$$
and since failure rates are assumed to be constant,

$$\mu_j = \frac{1}{\lambda_j} = \frac{1 - x}{\lambda_{sj} + \lambda_{BITj}} \qquad (5.28)$$

or, with $\mu_{sj} = 1/\lambda_{sj}$,

Case 1: $$\mu_j = \mu_{sj}\, \frac{1 - x}{1 + \lambda_{BITj}/\lambda_{sj}} \qquad (5.29)$$

where $\lambda_{sj}$ and $\lambda_{BITj}$ are easily obtained or prescribed from the design, and x is the percentage rate of false alarms. Equation (5.29) expresses the mean time between failures $\mu_j$ for systems with BIT in terms of parameters that are natural to estimate or prescribe as the system is being designed.


Figure 8: Mean time between failures as a function of the design parameters and the percentage of false alarms x.
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For  example:  If  we  can  tolerate  5  to  10%  of  false  alarms,  and  if  BIT 
is  constructed  from  2  to  10  times  better  parts  than  the  monitored 
subsystem,  the  new  MTBF  will  be  only  60  to  86%  of  original  MTBF  of 
the  subsystem. 
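The example can be reproduced directly from (5.29). In the sketch below, `xi` stands for the ratio $\lambda_{BITj}/\lambda_{sj}$, so parts 2 to 10 times better correspond to xi = 0.5 to 0.1; the function name is our own.

```python
def mtbf_ratio(x, xi):
    """mu_j / mu(s_j) in Case 1, eq. (5.29): x = false alarm fraction, xi = lambda_BITj / lambda_sj."""
    return (1.0 - x) / (1.0 + xi)


# The two corners of the example in the text.
worst = mtbf_ratio(0.10, 0.5)   # 10% false alarms, BIT parts only 2 times better
best = mtbf_ratio(0.05, 0.1)    # 5% false alarms, BIT parts 10 times better
print(f"MTBF ratio between {worst:.0%} and {best:.0%}")
```

This prints the 60% and 86% end points quoted above.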

V2.2 ESTIMATES FOR THE MEAN TIME TO REPAIR $\nu_j$

To estimate the mean time to repair we proceed as above, with one very important difference: we will not assume anything about the repair time distribution, as we did with failures.

In case 1, the time to complete repairs is a function of what caused the maintenance operation on the j'th block:

1 - the subsystem has failed
2 - the BIT has physically failed
3 - false alarm

In cases 2 and 3, the block goes to repair only when the subsystem fails, but the time to repair depends on the situation:

1 - only the subsystem has failed
2 - the BIT has failed before (or has not been repaired yet)
3 - false alarms occurred before

So in cases 1, 2 and 3, if we denote by $\nu_{s_j=0}$, $\nu_{BIT_j=0}$ and $\nu_{FA}$ the mean times to repair for the 1st, 2nd and 3rd cause respectively, then

$$\text{Cases 1-3:} \quad \nu_j = \nu_{s_j=0}\,P(1) + \nu_{BIT_j=0}\,P(2) + \nu_{FA}\,P(3) \qquad (5.30)$$

$$\text{Case 4:} \quad \nu_j = \nu(s_j) \qquad (5.31)$$

while in case 4 the repair of the BIT is omitted. Note that P(1), P(2) and P(3) in cases 1 to 3 are the probabilities that the j'th subsystem, its BIT, or a false alarm, respectively, occurs first, before the other two:


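$$P(1) = \frac{\lambda_{sj}}{\lambda_j} = \frac{1 - x}{1 + \lambda_{BITj}/\lambda_{sj}}, \qquad P(2) = \frac{\lambda_{BITj}}{\lambda_j} = \frac{(1 - x)\,\lambda_{BITj}/\lambda_{sj}}{1 + \lambda_{BITj}/\lambda_{sj}}, \qquad P(3) = x \qquad (5.32)$$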

But in case 2, where BIT's are repaired separately, $\lambda_{BITj}$ should be replaced by the rate of only those BIT failures which are not repaired at the subsystem failures. We omit the details here, since all the other derivations do not change.

To estimate the mean time to repair $\nu_j$ in equation (5.30) above, we divide the repair time into several stages:

set up for tests
failure detection, failure isolation (FD/FI)
replacement
verification

The time to repair the subsystem or to repair the BIT will be the sum of the times needed to accomplish those separate tasks:

$$\nu(s_j) = \nu_{setup}(s_j) + \nu_{FD/FI}(s_j) + \nu_{repl}(s_j) + \nu_{ver}(s_j)$$

$$\nu(BIT_j) = \nu_{setup}(BIT_j) + \nu_{FD/FI}(BIT_j) + \nu_{repl}(BIT_j) + \nu_{ver}(BIT_j) \qquad (5.33)$$

where $\nu(s_j)$ and $\nu(BIT_j)$ are the mean repair times of the j'th subsystem and of the j'th BIT, and $\nu_{setup}(\cdot)$, $\nu_{FD/FI}(\cdot)$, $\nu_{repl}(\cdot)$, $\nu_{ver}(\cdot)$ correspond to the repair stages described above.
The mean time to repair of the j'th block when the subsystem failed but the BIT functions properly, $\nu_{s_j=0}$, is:

$$\nu_{s_j=0} = \nu(s_j) - \nu_{FD/FI}(s_j) \qquad (5.34)$$

since failure detection and isolation is provided by the BIT, and practically no time is spent on it in comparison with the other tasks.

The time to repair is different in every specific case and depends on factors such as:

training of the personnel
skill level of the personnel
available equipment

and others. In a typical case, 10% of the expected time to repair goes to set up, 50% to trouble shooting, 30% to replacement and the remaining 10% to verification ( ). The validity of this assumption can be checked on existing equipment and then transferred to new designs.

$$\nu_{setup}(\cdot) = 0.1\,\nu(\cdot), \quad \nu_{FD/FI}(\cdot) = 0.5\,\nu(\cdot), \quad \nu_{repl}(\cdot) = 0.3\,\nu(\cdot), \quad \nu_{ver}(\cdot) = 0.1\,\nu(\cdot) \qquad (5.35)$$

This assumption of nominal relations among the durations of the portions of the repair cycle is fundamental to the assessment of the contribution of BIT to system availability. Equation (5.34) can now be rewritten:

$$\nu_{s_j=0} = 0.5\,\nu(s_j) \qquad (5.36)$$

When the BIT physically fails, the complete maintenance of the corresponding block consists of discovering the condition and replacing the BIT. No part of the subsystem needs to be replaced in case 1, but in cases 2 and 3 the subsystem is always repaired, since it was the cause of the maintenance.

$$\text{Case 1:} \quad \nu_{BIT_j=0} = \nu(s_j) - \nu_{repl}(s_j) + \nu(BIT_j) = 0.7\,\nu(s_j) + \nu(BIT_j)$$

$$\text{Cases 2, 3:} \quad \nu_{BIT_j=0} = \nu(s_j) + \nu(BIT_j) \qquad (5.37)$$

In the case of a false alarm, nothing really fails in case 1, while in cases 2 and 3 the subsystem must again be repaired:

$$\nu_{FAj} = \nu(s_j) - \nu_{repl}(s_j) + \nu(BIT_j) - \nu_{repl}(BIT_j) \qquad (5.38)$$

$$\text{Case 1:} \quad \nu_{FAj} = 0.7\,\nu(s_j) + 0.7\,\nu(BIT_j)$$

$$\text{Cases 2, 3:} \quad \nu_{FAj} = \nu(s_j) + 0.7\,\nu(BIT_j)$$

The above results are now plugged into the final equation:

$$\text{Case 1:} \quad \nu_j = 0.5\,\nu(s_j)P(1) + [0.7\,\nu(s_j) + \nu(BIT_j)]P(2) + 0.7[\nu(s_j) + \nu(BIT_j)]P(3)$$

$$\text{Cases 2, 3:} \quad \nu_j = 0.5\,\nu(s_j)P(1) + [\nu(s_j) + \nu(BIT_j)]P(2) + [\nu(s_j) + 0.7\,\nu(BIT_j)]P(3)$$

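After some manipulation:

$$B_1 = \frac{0.5 + \frac{\lambda_{BITj}}{\lambda_{sj}}\left(0.7 + \frac{\nu(BIT_j)}{\nu(s_j)}\right)}{1 + \lambda_{BITj}/\lambda_{sj}}, \qquad \frac{\nu_j}{\nu(s_j)} = B_1 + \left[0.7\left(1 + \frac{\nu(BIT_j)}{\nu(s_j)}\right) - B_1\right]x \qquad \text{Case 1} \quad (5.39)$$

$$B_2 = \frac{0.5 + \frac{\lambda_{BITj}}{\lambda_{sj}}\left(1 + \frac{\nu(BIT_j)}{\nu(s_j)}\right)}{1 + \lambda_{BITj}/\lambda_{sj}}, \qquad \frac{\nu_j}{\nu(s_j)} = B_2 + \left[1 + 0.7\,\frac{\nu(BIT_j)}{\nu(s_j)} - B_2\right]x \qquad \text{Cases 2, 3} \quad (5.40)$$

Although the above equation for the mean time to repair the j'th block might seem a little awkward, it is really quite simple. To appreciate its meaning, we draw some plots for case 1.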
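To make the bookkeeping behind (5.39) and (5.40) explicit, the sketch below computes $y = \nu_j/\nu(s_j)$ twice: once term by term from (5.30), using the first-occurrence probabilities P(1) to P(3) and the stage fractions of (5.35) to (5.38), and once from the closed forms. All times are in units of $\nu(s_j)$; `xi` and `eta` denote $\lambda_{BITj}/\lambda_{sj}$ and $\nu(BIT_j)/\nu(s_j)$, names chosen for this sketch only.

```python
# Stage fractions of one repair, as assumed in (5.35): FD/FI = 0.5, replacement = 0.3
# (set-up and verification, 0.1 each, make up the rest of the unit repair time).
FD_FI = 0.5
REPL = 0.3


def repair_ratio_direct(xi, eta, x, case):
    """y = nu_j / nu(s_j) assembled term by term from (5.30); times in units of nu(s_j)."""
    p1 = (1.0 - x) / (1.0 + xi)          # subsystem fails first
    p2 = (1.0 - x) * xi / (1.0 + xi)     # BIT fails first
    p3 = x                               # false alarm
    nu_s0 = 1.0 - FD_FI                  # (5.36): the BIT does the FD/FI
    if case == 1:
        nu_bit0 = (1.0 - REPL) + eta         # (5.37): nothing replaced in the subsystem
        nu_fa = (1.0 - REPL) * (1.0 + eta)   # (5.38): nothing replaced at all
    else:                                    # cases 2 and 3
        nu_bit0 = 1.0 + eta
        nu_fa = 1.0 + (1.0 - REPL) * eta
    return nu_s0 * p1 + nu_bit0 * p2 + nu_fa * p3


def repair_ratio_closed(xi, eta, x, case):
    """Closed forms (5.39) for case 1 and (5.40) for cases 2 and 3."""
    if case == 1:
        b = (0.5 + xi * (0.7 + eta)) / (1.0 + xi)
        return b + (0.7 * (1.0 + eta) - b) * x
    b = (0.5 + xi * (1.0 + eta)) / (1.0 + xi)
    return b + (1.0 + 0.7 * eta - b) * x


print(repair_ratio_direct(0.1, 1.0, 0.1, case=1), repair_ratio_closed(0.1, 1.0, 0.1, case=1))
```

Both routes give the same number for any parameter set, which is a quick way to confirm the algebra before plotting.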
1. Typical plot; $\lambda_{BITj}/\lambda_{sj} = 0.1$:

$$y = 0.518 + 0.091\,\eta + (0.182 + 0.609\,\eta)x, \qquad y = \frac{\nu_j}{\nu(s_j)}, \quad \eta = \frac{\nu(BIT_j)}{\nu(s_j)} \qquad (5.41)$$

Figure 9: Mean time to repair in a typical situation.

The shaded area represents the region where inclusion of the BIT gives no improvement in mean repair time. The influence of the false alarms is clearly seen.


2. Ideal system, with no false alarms, x = 0

Again the shaded area is the "no improvement" area. So even without false alarms, the physical failures of the BIT can ruin our expectations of better and faster repairs. Note that a BIT twice as good as its subsystem cannot help if the mean time to repair the BIT is too long.
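The remark about a BIT twice as good as its subsystem can be quantified. With x = 0, (5.39) reduces to $y = B_1$, and $y < 1$ holds exactly when $\xi\eta < 0.5 + 0.3\xi$, i.e. $\eta < 0.3 + 0.5/\xi$, with $\xi = \lambda_{BITj}/\lambda_{sj}$ and $\eta = \nu(BIT_j)/\nu(s_j)$. The short sketch below (function names are ours) evaluates this bound.

```python
def b1(xi, eta):
    """B_1 of (5.39): the case 1 repair time ratio y when there are no false alarms (x = 0)."""
    return (0.5 + xi * (0.7 + eta)) / (1.0 + xi)


def max_bit_repair_ratio(xi):
    """Largest eta = nu(BIT_j)/nu(s_j) for which the BIT still shortens repairs when x = 0."""
    return 0.3 + 0.5 / xi


for xi in (0.5, 0.1):
    eta_max = max_bit_repair_ratio(xi)
    print(f"xi = {xi}: BIT helps only if eta < {eta_max:.1f} (B_1 at the limit = {b1(xi, eta_max):.3f})")
```

So a BIT twice as reliable as its subsystem (xi = 0.5) stops paying off once its MTTR exceeds 1.3 times the subsystem MTTR.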


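3. Ideal BIT which has a negligible physical failure rate, $\lambda_{BITj}/\lambda_{sj} \to 0$:

$$y = 0.5 + (0.2 + 0.7\,\eta)x, \qquad y = \frac{\nu_j}{\nu(s_j)}, \quad \eta = \frac{\nu(BIT_j)}{\nu(s_j)} \qquad (5.42)$$

4. BIT is repaired very fast, $\nu(BIT_j)/\nu(s_j) \to 0$:

$$B_1 = \frac{0.5 + 0.7\,\lambda_{BITj}/\lambda_{sj}}{1 + \lambda_{BITj}/\lambda_{sj}}, \qquad y = B_1 + [0.7 - B_1]\,x \qquad (5.43)$$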
To summarize all these plots, we plot the parameters for y = 1, or $\nu_j = \nu(s_j)$, where no improvement in the repair time occurs; y < 1, or $\nu_j < \nu(s_j)$, is the whole area below a particular curve in the plot.

Since $\nu(BIT_j)/\nu(s_j)$ and $\lambda_{BITj}/\lambda_{sj}$ are design parameters, this plot tells us roughly what percentage x of false alarms we can afford to tolerate for an improvement of $\nu_j$ over $\nu(s_j)$. The plot provides the designer with some estimate of the improvement. For example, let us say that at most 10% of FA can be tolerated. Since $\lambda_{sj}$ and $\nu(s_j)$ are usually given, $A_{sj} = 1/(1 + \nu(s_j)\lambda_{sj})$ is fixed. To improve the availability $A_j$, BIT should be added. If we can afford only two to ten times better parts in the BIT, so that $\lambda_{BITj}/\lambda_{sj}$ is 0.5 to 0.1, then we find from figure 17 that the MTTR of the BIT must be only 70% of the MTTR of the subsystem in order to improve $A_j$ at all. Usually parts for the BIT will not be readily available, and because the BIT is of higher quality and thus fails less often, the technicians will spend more time repairing it. If this time is bigger than the MTTR of the subsystem, then it is better to redesign the subsystem and omit the BIT. In the next section we discuss the change in availability with the BIT.


V2.3  Estimates  for  the  system  availabilities 


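Since $A_j = 1/(1 + \nu_j/\mu_j)$ under assumption I and $A_{av} = 1/(1 + \sum_{j=1}^{n}\nu_j/\mu_j)$, we compute the system availabilities using the previously derived expressions for the mean time to repair and the mean time between failures of BIT-monitored blocks:

$$\frac{\nu_j}{\mu_j} = \frac{\mu(s_j)}{\mu_j}\cdot\frac{\nu_j}{\nu(s_j)}\cdot\frac{\nu(s_j)}{\mu(s_j)} = \frac{1}{\mu_j/\mu(s_j)}\cdot\frac{\nu_j}{\nu(s_j)}\cdot\frac{\nu(s_j)}{\mu(s_j)} \qquad (5.44)$$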
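Cases 2 and 3

In cases 2 and 3, where false alarms do not influence the failure rate, $\mu_j = \mu(s_j)$:

$$\frac{\nu_j}{\mu_j} = \frac{1}{\mu(s_j)/\mu(s_j)}\cdot\frac{\nu_j}{\nu(s_j)}\cdot\frac{\nu(s_j)}{\mu(s_j)} = \frac{\nu_j}{\nu(s_j)}\cdot\frac{\nu(s_j)}{\mu(s_j)} \qquad (5.45)$$

$$\frac{\nu_j}{\mu_j} = \left[\frac{0.5 + \frac{\lambda_{BITj}}{\lambda_{sj}} + \frac{\lambda_{BITj}}{\lambda_{sj}}\cdot\frac{\nu(BIT_j)}{\nu(s_j)}}{1 + \lambda_{BITj}/\lambda_{sj}}\,(1 - x) + \left(1 + 0.7\,\frac{\nu(BIT_j)}{\nu(s_j)}\right)x\right]\frac{\nu(s_j)}{\mu(s_j)} \qquad (5.46)$$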


Under assumption II of suspended animation, the above can be inserted directly into the expression for the average availability

$$A_{av} = \frac{1}{1 + \sum_{j=1}^{n}\nu_j/\mu_j}, \qquad A^{0}_{av} = \frac{1}{1 + \sum_{j=1}^{n}\nu(s_j)/\mu(s_j)}$$

where $A^{0}_{av}$ is the availability consisting only of the subsystems without the monitoring BIT. Under assumption I of independent blocks, on the other hand, we are left with some computations. We define the following:

$$A_{BITj} = \frac{1}{1 + \nu(BIT_j)/\mu(BIT_j)}, \qquad A_{sj} = \frac{1}{1 + \nu(s_j)/\mu(s_j)}, \qquad A_j = \frac{1}{1 + \nu_j/\mu_j} \qquad (5.47)$$


where $A_{BITj}$ is the asymptotic value of the availability of the j'th BIT, and $A_{sj}$ is similarly the asymptotic value of the availability of the j'th subsystem, or the "previous" availability, i.e. the availability from before the BIT was attached. $A_j$ is the availability of the new, BIT-equipped subsystem, which we called block j. Hence

$$\frac{\nu(BIT_j)}{\mu(BIT_j)} = \frac{1}{A_{BITj}} - 1, \qquad \frac{\nu(s_j)}{\mu(s_j)} = \frac{1}{A_{sj}} - 1, \qquad \frac{\nu_j}{\mu_j} = \frac{1}{A_j} - 1 \qquad (5.48)$$

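We insert the above into the main equation (5.46), and the result is:

$$\frac{1}{A_j} = 1 + \left[\frac{0.5 + \frac{\lambda_{BITj}}{\lambda_{sj}} + \frac{1/A_{BITj} - 1}{1/A_{sj} - 1}}{1 + \lambda_{BITj}/\lambda_{sj}}\,(1 - x) + \left(1 + 0.7\,\frac{1/A_{BITj} - 1}{(\lambda_{BITj}/\lambda_{sj})(1/A_{sj} - 1)}\right)x\right]\left(\frac{1}{A_{sj}} - 1\right) \qquad (5.49)$$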

Since we are really interested in the percentage of availability improvement, called α, of the BIT-equipped block versus the block without BIT, which is just the subsystem:

$$\alpha = \frac{A_j - A_{sj}}{A_{sj}} = \frac{\Delta A_j}{A_{sj}} \qquad (5.50)$$

let y be the expression in the main equation:

$$y = \frac{1/A_j - 1}{1/A_{sj} - 1} \qquad (5.51)$$

then:

$$A_{sj} = \frac{1 - (1 + \alpha)\,y}{(1 + \alpha)(1 - y)} \qquad (5.52)$$


Similarly, we define z as the ratio between the availability of the BIT and the corresponding subsystem availability:

$$z = \frac{A_{BITj}}{A_{sj}}, \qquad A_{sj} = \frac{1/z - u}{1 - u} \qquad (5.53)$$

where u is the expression in the main equation:

$$u = \frac{1/A_{BITj} - 1}{1/A_{sj} - 1} \qquad (5.54)$$

The above manipulations enable us to construct the nomogram, where $\lambda_{BITj}/\lambda_{sj}$ is the parameter. To summarize:


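$$y = \frac{\nu_j}{\nu(s_j)} = \frac{0.5 + \xi + u}{1 + \xi}\,(1 - x) + \left(1 + 0.7\,\frac{u}{\xi}\right)x, \qquad \alpha = \frac{A_j}{A_{sj}} - 1$$

$$\xi = \frac{\lambda_{BITj}}{\lambda_{sj}}, \qquad u = \frac{1/A_{BITj} - 1}{1/A_{sj} - 1} = \xi\,\frac{\nu(BIT_j)}{\nu(s_j)} \qquad (5.55)$$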


Obviously $\alpha_{max} = 1/A_{sj} - 1$ is achieved when the availability $A_j$ is 1 (y = 0), while the extreme value u = 0 is reached when the BIT is 100% available. Usually we can estimate $A_{BITj}$ from experience more easily than $\nu(BIT_j)$, the MTTR of the BIT, but in any case the nomogram for determining $A_{BITj}$ from $A_{sj}$, knowing $\nu(BIT_j)/\nu(s_j)$ and $\lambda_{BITj}/\lambda_{sj}$, is added at the bottom. The plot might also serve for a quick estimate of the MTTF and MTBF required of the BIT to achieve a certain improvement.
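The chain (5.49) to (5.55) can be followed numerically for cases 2 and 3. The sketch below (all names are ours, and the input availabilities are arbitrary) computes u and y from the availabilities, obtains $A_j$ from (5.51) and α from (5.50), and checks that (5.52) returns the subsystem availability.

```python
def improvement_cases_2_3(a_s, a_bit, xi, x):
    """Return (y, A_j, alpha) for a BIT-equipped block in cases 2 and 3.

    a_s   -- subsystem availability A_sj (before the BIT is added)
    a_bit -- BIT availability A_BITj
    xi    -- lambda_BITj / lambda_sj
    x     -- false alarm fraction
    """
    u = (1.0 / a_bit - 1.0) / (1.0 / a_s - 1.0)                              # (5.54)
    y = (0.5 + xi + u) / (1.0 + xi) * (1.0 - x) + (1.0 + 0.7 * u / xi) * x  # (5.55)
    a_j = 1.0 / (1.0 + y * (1.0 / a_s - 1.0))                                # from (5.51)
    alpha = a_j / a_s - 1.0                                                  # (5.50)
    return y, a_j, alpha


def subsystem_availability(alpha, y):
    """Inverse relation: A_sj from alpha and y, eq. (5.52)."""
    return (1.0 - (1.0 + alpha) * y) / ((1.0 + alpha) * (1.0 - y))


y, a_j, alpha = improvement_cases_2_3(a_s=0.80, a_bit=0.98, xi=0.2, x=0.10)
print(f"y = {y:.3f}, A_j = {a_j:.3f}, alpha = {alpha:+.1%}")
```

The improvement can never exceed $\alpha_{max} = 1/A_{sj} - 1$, which is reached only in the limit y = 0.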
The above nomogram can be used to determine the % of improvement from the design characteristics: $\lambda_{BITj}/\lambda_{sj}$, the rate of false alarms x, and both availabilities $A_{sj}$ and $A_{BITj}$. Also, if we have in mind a desired percentage improvement of the availability, the needed values of the parameters can be obtained from the nomogram.

Since the asymptotic value of the reliability importance $I_j$ was defined in (4.12) as

$$I_j = h(1_j, [A]) - h(0_j, [A]) = \frac{\partial h([A])}{\partial A_j} \qquad (5.56)$$

we can easily assess the overall improvement of the system availability:

$$\Delta h([A]) \approx \sum_{j=1}^{n}\frac{\partial h([A])}{\partial A_j}\,\Delta A_j = \sum_{j=1}^{n} I_j\,\Delta A_j \qquad (5.57)$$

where $\Delta h([A]) = h([A]) - h([A_0])$ and $\Delta A_j = A_j - A_{sj}$, and $\Delta A_j$ can be obtained simply from the previous nomogram. From equation 5.52 we conclude that only the most important blocks, those with large $I_j$, count. So we concentrate only on improving these blocks, since the others are not influential ($I_j \approx 0$).
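For a series system, where $h([A]) = \prod_j A_j$ and hence $I_j = \prod_{i \neq j} A_i$, the first-order estimate (5.57) can be compared with the exact change. The availabilities and improvements below are made-up inputs for illustration only.

```python
from math import prod


def series_availability(avail):
    """h([A]) for a series structure: product of the block availabilities."""
    return prod(avail)


def importances_series(avail):
    """I_j = dh/dA_j = product of all other availabilities (series structure)."""
    return [prod(a for i, a in enumerate(avail) if i != j) for j in range(len(avail))]


def first_order_change(avail, delta):
    """Delta h ~= sum_j I_j * Delta A_j, eq. (5.57)."""
    return sum(i_j * d for i_j, d in zip(importances_series(avail), delta))


before = [0.80, 0.95, 0.90]   # hypothetical A_sj
delta = [0.05, 0.00, 0.02]    # hypothetical Delta A_j read from the nomogram
after = [a + d for a, d in zip(before, delta)]
approx = first_order_change(before, delta)
exact = series_availability(after) - series_availability(before)
print(f"first-order {approx:.4f} vs exact {exact:.4f}")
```

The small gap between the two numbers is the second-order term that (5.57) neglects.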

The same is true under assumption II of suspended animation. Here we assumed a series connection, so that the worst subsystems are also the most important. We then concentrate on these blocks.

For rough estimates of the marginal improvement, we specify $\lambda_{BITj}/\lambda_{sj}$ and evaluate:
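$$z = \frac{1}{(1 - B)\,A_{sj} + B}, \qquad B = \frac{0.5}{1 + 0.7\,\dfrac{x}{1 - x}\cdot\dfrac{1 + \lambda_{BITj}/\lambda_{sj}}{\lambda_{BITj}/\lambda_{sj}}} \qquad (5.58)$$

where B is the value of u at which y = 1 in (5.55), i.e. the marginal case α = 0.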
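The marginal case can be cross-checked numerically: in (5.55), y is linear in u, so the break-even value B of u (where y = 1, i.e. α = 0) follows in closed form, and (5.53) then gives the corresponding normalized BIT availability z. This is a sketch under the case 2 and 3 relations; the function names are hypothetical.

```python
def y_cases_2_3(u, xi, x):
    """Repair time ratio y of (5.55) as a function of u (cases 2 and 3)."""
    return (0.5 + xi + u) / (1.0 + xi) * (1.0 - x) + (1.0 + 0.7 * u / xi) * x


def break_even_u(xi, x):
    """Value B of u at which y = 1, i.e. the BIT gives zero availability improvement."""
    return 0.5 / (1.0 + 0.7 * (x / (1.0 - x)) * (1.0 + xi) / xi)


def break_even_z(a_s, xi, x):
    """Normalized BIT availability z = A_BITj / A_sj on the zero-improvement curve, from (5.53)."""
    b = break_even_u(xi, x)
    return 1.0 / ((1.0 - b) * a_s + b)


for a_s in (0.7, 0.8, 0.9):
    print(a_s, round(break_even_z(a_s, xi=0.2, x=0.10), 4))
```

With no false alarms the break-even value is simply B = 0.5, independent of $\lambda_{BITj}/\lambda_{sj}$.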


Figure 16 presents the results obtained. The plot is similar to figure 14, only here the relation is between the availability of the subsystem $A_{sj}$ and the normalized BIT availability $A_{BITj}/A_{sj}$. The nonfeasible area is clearly visible. For fixed values of the false alarm rate, the area under the curve will yield improvement. This plot might thus serve as a quick orientation in the design.

Figure 16: Plot of marginal improvement.
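Since the percentage of the reliability improvement $\alpha = \Delta A_j/A_{sj}$ for the j'th block is expressed through the above set of equations, it is also of some interest to know the influence, or sensitivity, of α to the parameters:

$$\alpha = g\left(A_{sj}, \frac{\lambda_{BITj}}{\lambda_{sj}}, x, \frac{A_{BITj}}{A_{sj}}\right) \qquad (5.59)$$

Now, as before:

$$\Delta\alpha = \frac{\partial g}{\partial A_{sj}}\,\Delta A_{sj} + \frac{\partial g}{\partial \xi}\,\Delta\xi + \frac{\partial g}{\partial x}\,\Delta x + \frac{\partial g}{\partial z}\,\Delta z \qquad (5.60)$$

where $z = A_{BITj}/A_{sj}$ and $\xi = \lambda_{BITj}/\lambda_{sj}$. And again we can plot these values.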

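We might also be interested in the percentage changes. Since the center of our discussion is the availability improvement or degradation, we cannot use the relative change Δα/α of the improvement α, since it might blow up around zero. Instead we define:

$$q = \frac{A_{j,new}}{A_{sj}} = \frac{\Delta A_j}{A_{sj}} + 1 = 1 + \alpha \qquad (5.61)$$

Since Δ(1 + α) = Δα, we determine the sensitivity coefficients as:

$$\frac{\Delta\alpha}{1 + \alpha} = \frac{\partial\ln q}{\partial\ln A_{sj}}\,\frac{\Delta A_{sj}}{A_{sj}} + \frac{\partial\ln q}{\partial\ln\xi}\,\frac{\Delta\xi}{\xi} + \frac{\partial\ln q}{\partial\ln z}\,\frac{\Delta z}{z} + \frac{\partial\ln q}{\partial\ln x}\,\frac{\Delta x}{x} \qquad (5.62)$$

where, for example, $\partial\ln q/\partial\ln\xi = (\xi/q)\,\partial q/\partial\xi$, and we can plot these weights as before.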
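The logarithmic weights in (5.62) can also be estimated numerically. The sketch below takes $q = 1 + \alpha = A_j/A_{sj}$ for cases 2 and 3, written as a function of $(A_{sj}, \xi, x, z)$ via (5.53) to (5.55), and uses central differences in the logarithms; the step size and the sample point are arbitrary choices of ours, not values from the report.

```python
import math


def q_cases_2_3(a_s, xi, x, z):
    """q = 1 + alpha = A_j / A_sj for cases 2 and 3, with z = A_BITj / A_sj."""
    a_bit = z * a_s
    u = (1.0 / a_bit - 1.0) / (1.0 / a_s - 1.0)                              # (5.54)
    y = (0.5 + xi + u) / (1.0 + xi) * (1.0 - x) + (1.0 + 0.7 * u / xi) * x  # (5.55)
    a_j = 1.0 / (1.0 + y * (1.0 / a_s - 1.0))                                # (5.51)
    return a_j / a_s


def log_sensitivities(a_s, xi, x, z, rel_step=1e-5):
    """Central-difference estimates of d ln q / d ln p for p in (A_sj, xi, x, z)."""
    point = {"a_s": a_s, "xi": xi, "x": x, "z": z}
    result = {}
    for name, value in point.items():
        up = dict(point, **{name: value * (1.0 + rel_step)})
        down = dict(point, **{name: value * (1.0 - rel_step)})
        dlnq = math.log(q_cases_2_3(**up)) - math.log(q_cases_2_3(**down))
        dlnp = math.log1p(rel_step) - math.log1p(-rel_step)
        result[name] = dlnq / dlnp
    return result


print(log_sensitivities(a_s=0.80, xi=0.2, x=0.10, z=1.2))
```

Each entry is the weight multiplying the corresponding relative change in (5.62).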

Figure 17 presents the results. The left column shows the partial derivatives and the right one the sensitivity coefficients. Nearly perfect BIT's are at the top: $\lambda_{BITj}/\lambda_{sj} = 0.01$, i.e. the mean time to failure of a BIT is 100 times greater than that of the corresponding subsystem. At the bottom there is a plot for BIT's of equal quality, where the mean times to failure of a BIT and of its subsystem are equal. The graphs show that the contribution to the improvement results from the initial availability of the subsystem $A_{sj}$, which is more or less given and is the actual reason for adding BIT's. In the same category is also $z = A_{BITj}/A_{sj}$, which measures the relative quality of the BIT. Since failure rates are already expressed explicitly in equations 5.55 and 5.57, z also represents the relation between the mean times to repair of a BIT and of its subsystem. Clearly, the better the BIT's, the better the availability of the block. False alarms still harm the availability, but not to the same degree as the other parameters. This is also true in the field, since "good" systems will mostly show false alarms and the personnel will get "used to" them, which is acceptable when the failures are not catastrophic or when they can be detected in some other way.

Similarly, the effect of the BIT quality $\xi = \lambda_{BITj}/\lambda_{sj}$ on the block availability decreases with the quality. In all cases, the derivatives and the sensitivity coefficients increase drastically with the false alarm rate. So by reducing false alarms we can expect better performance of the block. One remark is also necessary here. As maintenance personnel get used to false alarms, they simply stop trusting the BIT indications. So false alarms might very well return our system to the previous non-BIT state, not to mention the loss of the producer's image (besides the loss of quality).

Figure 18 shows the partial derivatives and sensitivity coefficients as functions of ξ, the quality of the BIT. There are no significant differences between different values of ξ. The conclusion we can draw from the graphs is that good BIT quality influences the improvement much more than bad.
Figure 17: Partial derivatives of % improvement on the left and sensitivity coefficients of the ratio of improvement Δα/(1 + α) on the right.

Figure 18: Partial derivatives and sensitivity coefficients as in Figure 17, but now as functions of $\xi = \lambda_{BITj}/\lambda_{sj}$.
Combining the last results with the overall system, we are able, for example, to roughly determine the sensitivity of our system to, let's say, the false alarm rate:

$$\Delta A \approx \sum_{j=1}^{n} I_j\,\Delta A_j \approx \sum_{j=1}^{n} I_j\,A_j\,\frac{\partial\ln(1 + \alpha_j)}{\partial\ln x_j}\,\frac{\Delta x_j}{x_j} \qquad (5.63)$$


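Case 1

The only difference from cases 2 and 3 above is the bigger failure rate of the j'th block, since it is influenced by the false alarms. As before:

$$\frac{\nu_j}{\mu_j} = \frac{1}{\mu_j/\mu(s_j)}\cdot\frac{\nu_j}{\nu(s_j)}\cdot\frac{\nu(s_j)}{\mu(s_j)}$$

Inserting equations 5.29 and 5.39 into the above:

$$\frac{\nu_j}{\mu_j} = \frac{1 + \lambda_{BITj}/\lambda_{sj}}{1 - x}\left[\frac{0.5 + \frac{\lambda_{BITj}}{\lambda_{sj}}\left(0.7 + \frac{\nu(BIT_j)}{\nu(s_j)}\right)}{1 + \lambda_{BITj}/\lambda_{sj}}\,(1 - x) + 0.7\left(1 + \frac{\nu(BIT_j)}{\nu(s_j)}\right)x\right]\frac{\nu(s_j)}{\mu(s_j)} \qquad (5.64)$$

$$\frac{\nu_j}{\mu_j} = \left[0.5 + \frac{\lambda_{BITj}}{\lambda_{sj}}\left(0.7 + \frac{\nu(BIT_j)}{\nu(s_j)}\right) + 0.7\left(1 + \frac{\lambda_{BITj}}{\lambda_{sj}}\right)\left(1 + \frac{\nu(BIT_j)}{\nu(s_j)}\right)\frac{x}{1 - x}\right]\frac{\nu(s_j)}{\mu(s_j)} \qquad (5.65)$$

As before, the nomogram can be constructed using the above and (5.55):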
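$$y = \frac{\nu_j}{\nu(s_j)}, \qquad \alpha = \frac{\Delta A_j}{A_{sj}}$$

$$\frac{1/A_j - 1}{1/A_{sj} - 1} = \frac{1 + \xi}{1 - x}\,y = \frac{1}{1 - x}\left[0.5 + 0.2x + 0.7\xi + \left(1 - 0.3x + \frac{0.7x}{\xi}\right)u\right]$$

$$\xi = \frac{\lambda_{BITj}}{\lambda_{sj}}, \qquad u = \frac{1/A_{BITj} - 1}{1/A_{sj} - 1} = \xi\,\frac{\nu(BIT_j)}{\nu(s_j)}, \qquad z = \frac{A_{BITj}}{A_{sj}}$$

As before, α is the percentage of improvement (5.50):

$$\alpha = g\left(A_{sj}, \frac{\lambda_{BITj}}{\lambda_{sj}}, x, \frac{A_{BITj}}{A_{sj}}\right) \qquad (5.66)$$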
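As a consistency check on the case 1 relations: dividing y from (5.39) by $\mu_j/\mu(s_j)$ from (5.29) gives the availability ratio $(1/A_j - 1)/(1/A_{sj} - 1) = (1 + \xi)\,y/(1 - x)$, and expanding it with $u = \xi\eta$ gives $[0.5 + 0.2x + 0.7\xi + (1 - 0.3x + 0.7x/\xi)u]/(1 - x)$. The sketch below checks this expansion numerically and turns the ratio into α for a case 1 block; parameter names follow the text (`xi`, `eta`, `u`), and the numbers are arbitrary.

```python
def ratio_case1_direct(xi, eta, x):
    """(nu_j/mu_j) / (nu(s_j)/mu(s_j)) for case 1, from (5.29) and (5.39)."""
    b1 = (0.5 + xi * (0.7 + eta)) / (1.0 + xi)
    y = b1 + (0.7 * (1.0 + eta) - b1) * x        # nu_j / nu(s_j), eq. (5.39)
    return (1.0 + xi) / (1.0 - x) * y            # divide by mu_j / mu(s_j) of (5.29)


def ratio_case1_in_u(xi, u, x):
    """The same ratio written in terms of u = (1/A_BITj - 1)/(1/A_sj - 1) = xi * eta."""
    return (0.5 + 0.2 * x + 0.7 * xi + (1.0 - 0.3 * x + 0.7 * x / xi) * u) / (1.0 - x)


def alpha_case1(a_s, xi, eta, x):
    """Percentage improvement alpha = A_j / A_sj - 1 for a case 1 block."""
    r = ratio_case1_direct(xi, eta, x)
    a_j = 1.0 / (1.0 + r * (1.0 / a_s - 1.0))
    return a_j / a_s - 1.0


print(alpha_case1(a_s=0.80, xi=0.2, eta=0.5, x=0.10))
```

Because y is multiplied by $(1 + \xi)/(1 - x) > 1$, a case 1 block needs a noticeably smaller y than a case 2 or 3 block to break even, which is why the situation becomes worse in case 1.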


The plots of the partial derivatives and sensitivity coefficients are given on the next pages. The conclusions are similar to those before, only the situation becomes much worse.

Figure 19: Same as 15, but for case 1.

Figure 20: Same as 17, only for case 1.

Figure 21: Same as 18, only for case 1.
Before we proceed with an example, we will show that the one-to-one assumption of one subsystem monitored by one BIT is not very restrictive.


VI  STRUCTURES  WHICH  CAN  BE  REDUCED 
TO  ONE  SUBSYSTEM  ONE  BIT  IN  THE  BLOCK 

In  this  section  we  show  that  the  previous  discussion  is  much 
more  general  than  it  would  seem  from  the  restrictive  assumptions. 
First  we  discuss  a  central  BIT  controller  which  communicates  with 
each  block's  BIT.  We  also  show  how  to  deal  with  systems  controlled 
by  several  sensors  and  where  a  single  BIT  monitors  more  than  one 
subsystem. 


VI  1:  A  CENTRAL  BIT  CONTROLLER  WHICH 
COMMUNICATES  WITH  EACH  BLOCK'S  BIT 

Usually the design of a system with BIT capability consists of BIT's coupled closely to one or several blocks and a higher level central BIT system which collects, manipulates and stores the information. For instance, the crew of an airplane is interested in the status of their plane without the details, which should be provided to the maintenance personnel. The information displayed is therefore different for different users of the same system. To rule out unnecessary "repairs" because of false alarms from the upper level, all BIT's in blocks are always provided with their own displays.

We can treat the above situation as before, from the reliability and from the availability viewpoints. We can represent the central BIT unit as the BIT in a block where the subsystem is always available, $A_{sc}(t) = 1$, so that no repairs are needed.
*R([x.(t)]*)  =  *R([xsj(t)xBj(t)])  •  (i-xBi(t))  =  V[xsj(t)-xBj(t)])  (6.1)


Figure 22: Centralized BIT network. On the left is the reliability consideration, where grey components are irrelevant; on the right is the availability viewpoint, in which the central unit can trigger maintenance actions.

where $\phi'_R$ and $[\;]'$ are the augmented function and vector. The rationale behind this is that whenever something is declared "not OK" on the central screen, it is the technician's duty to verify what is wrong, so he is referred to the particular blocks.

The  next  section  generalizes  the  discussion  to  similar  treatment 
in  the  case  where  not  every  block  is  monitored  for  its  operational 
status . 


VI. 2:  BLOCKS  NOT  EQUIPPED  BY  BIT 

If some subsystem, for some reason, is not covered by a BIT, we just introduce a fictitious BIT in the block. We only have to keep in mind that for such a block $x^R_{Bj}(t) = 1$ whenever the system is in operation and $x^R_{Bj}(t) = 0$ when repair is concerned:

$$x^R_{Bj}(t) = \begin{cases} 1 & \text{functioning} \\ 0 & \text{maintenance} \end{cases} \qquad (6.2)$$

Hence, the mean repair time of the fictitious BIT is $\nu(BIT_j) = 0$, since no time is really spent. All the other derivations stay the same.

In nearly every system we will encounter situations where some of the subsystems possess several BIT's to monitor them, or where several subsystems are monitored by a single BIT. We elaborate on these situations in the next two sections.


VI. 3:  SUBSYSTEM  EQUIPPED  WITH  SEVERAL  BIT'S 
When complex subsystems are equipped with BIT's, the BIT's usually consist of different sensors that monitor various operations of the subsystem. For example, a jet engine can be equipped with temperature, pressure and flow rate sensors which, in addition to the regular measurements, will also provide information about what is wrong. Usually such arrangements are of the majority voting or k-out-of-n type. This kind of architecture is used to reduce the false alarm rate.

Figure 23: Typical situation where the j'th subsystem is monitored by several BIT's, here 2 out of 3. We can always reduce such situations to a 1-1, or one subsystem, one BIT configuration.

Let $\phi_{Bj}([x_{Bji}(t)])$ be the structure function of the BIT's. Then we define the status of the composite BIT as:

$$x_{Bj}(t) = \phi_{Bj}([x_{Bji}(t)]) \qquad (6.3)$$

where the index i runs over the BIT's in the j'th block. Everything else stays the same.
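As an illustration of (6.3), the sketch below uses a k-out-of-n structure function for the composite BIT, matching the 2-out-of-3 arrangement of Figure 23. Coding each BIT state as 1 (working) or 0 (failed), using a plain k-out-of-n vote, and assuming independent, identical sensors for the availability formula are our assumptions for the example; the report does not fix a particular voting rule.

```python
from itertools import product
from math import comb


def k_out_of_n(states, k):
    """Structure function: 1 if at least k of the binary states are 1 (working), else 0."""
    return 1 if sum(states) >= k else 0


def composite_bit_state(bit_states, k=2):
    """x_Bj(t) = phi_Bj([x_Bji(t)]) of (6.3) for a k-out-of-n voting arrangement."""
    return k_out_of_n(bit_states, k)


def k_out_of_n_availability(p, n, k):
    """Availability of the composite BIT if its n sensors are independent with availability p each."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


for states in product((0, 1), repeat=3):
    print(states, composite_bit_state(states))
print(k_out_of_n_availability(0.9, n=3, k=2))
```

The composite state, and its availability, then play the role of a single BIT in the 1-1 block.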

A  similar  situation  appears  when  several  subsystems  are  moni¬ 
tored  by  one  BIT. 

VI. 4:  SEVERAL  SUBSYSTEMS  MONITORED  BY  THE  SAME  BIT 

Good design practice will always try to avoid using a single BIT to monitor several subsystems, since when something is wrong, we have to find which one of the subsystems caused the problem. Usually such a case occurs when several identical subsystems are connected in parallel.

To treat the situation, we build the j'th block around each BIT and not around each subsystem.

As before, let $\phi_{sj}([x_{sji}(t)])$ be the structure function in the j'th block; we define:

$$x_{sj}(t) = \phi_{sj}([x_{sji}(t)]), \qquad i = 1, 2, \ldots, m \qquad (6.4)$$

where m is the number of subsystems in the block.
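For the parallel arrangement assumed in Figure 24, the structure function in (6.4) is the usual "at least one works" rule. The sketch below shows it, together with the resulting block availability if the m subsystems fail and are repaired independently; that independence is an added assumption of the example, and the function names are ours.

```python
from math import prod


def parallel_state(states):
    """phi_sj for a parallel structure, eq. (6.4): 1 if at least one subsystem state is 1."""
    return 1 - prod(1 - s for s in states)


def parallel_availability(avail):
    """Availability of m independent subsystems in parallel: 1 - prod(1 - A_sji)."""
    return 1.0 - prod(1.0 - a for a in avail)


print(parallel_state([0, 1, 0]), parallel_availability([0.9, 0.9]))
```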


Figure 24: Typical situation when several subsystems are monitored by one BIT. Here a parallel structure of the subsystems is assumed, which is the most usual situation. We always reduce the situation to a 1-1 block arrangement.

Everything stays the same; only the repair time must include terms due to possible misclassifications.

Of course there might be other arrangements between BIT's and the subsystems, but then the question of their effectiveness arises. If the BIT cannot be treated as one-to-one with the subsystems in the block, then the simplified analysis derived here will not suffice.

To show the usage of the treatment developed here, we present the following example.


VII.  EXAMPLE 

To illustrate the above model we discuss a highly simplified radar.

Figure 25: Functional sketch of the radar (antenna, transmitter-exciter, receiver, signal processor and system bus, with an RF test target and RF, IF and LO signal paths).
For proper operation, all the above subsystems must function properly, so the system has a series reliability structure:

Figure 26: Reliability structure of the radar (series connection of exciter, transmitter, receiver, antenna, display, processor, bus and radar data processor).

Let the (hypothetical) values of MTTF and MTTR of the subsystems be estimated as:
SUBSYSTEM          MTTF   MTTR   λsj·ν(sj)   AVAILABILITY Asj   IMPORTANCE Ij
EXCITER              20      5      .25            .800               .444
(illegible)           ?      ?       ?             .995               .357
(illegible)          50      2      .040           .962               .369
DISPLAY              80      2      .025           .976               .364
RECEIVER             11      5      .455           .687               .517
SIGNAL PROCESSOR     30      4      .133           .882               .402
DATA PROCESSOR       10      2      .20            .833               .426
BUS                  80      5      .063           .941               .377

Figure 27: Data.

For a system with a serial reliability structure,

$$I_j = \frac{\partial h([A])}{\partial A_j} = \prod_{i \neq j} A_i = \frac{A}{A_j},$$

so that the most important, or influential, components are the worst ones. This makes sense: we can expect the biggest increase when we improve the most critical, i.e. the worst, component:

$$\Delta A \approx \sum_{j=1}^{n} I_j\,\Delta A_j$$
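Using the subsystem availabilities of Figure 27, the series relations above can be evaluated directly. The sketch below reproduces the importance column from $I_j = A/A_j$; the list simply follows the row order of the table.

```python
from math import prod

# Subsystem availabilities A_sj from Figure 27, in table order.
availabilities = [0.800, 0.995, 0.962, 0.976, 0.687, 0.882, 0.833, 0.941]

system_a = prod(availabilities)                       # series structure: A = prod A_j
importance = [system_a / a for a in availabilities]   # I_j = A / A_j

print(f"A = {system_a:.3f}")
print([round(i, 3) for i in importance])
```

The largest importance, .517, belongs to the receiver, the subsystem with the lowest availability (.687), which is why the analysis starts there.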

Incorporating the BIT in the bus will not cause any additional effort: some fixed pattern of 0's and 1's is sent through and checked. Similarly, in the exciter we can provide a signal with known characteristics and observe its response on the screen. On the other hand, the central display does not need any BIT, since if it does not work, that will be self-evident.
Figure 28: Organization in six blocks.

As soon as the decision is made about equipping blocks with BIT, we can draw a reliability diagram including the BIT's. Note also that we can always withdraw a BIT from consideration by using the notation developed on page 36 in section VI.2.

We will consider the four cases as described at the beginning (page 1), but we will reverse the order for convenience:


CASE 4: BIT indications are ignored and no repairs are made on the BIT. For example: the radar serves for detection of car speed.

CASE 3: BIT "not OK" indications must wait for the radar maintenance. For example: the radar on a patrol boat.

CASE 2: BIT checked up immediately and corrected. For example: the radar on a carrier, with technicians always available.

CASE 1: BIT indication "not OK" sends the whole system out of operational readiness. For example: the radar in the guidance system of a missile; no officer wants it to explode.
SPEED CONTROL RADAR:

Case 4 is really not of interest; since there are no BIT repairs, nothing is really different from "classical" systems without BIT.

CASES 2 AND 3:

The difference between these two cases is in the BIT failure rate. While in case 3 it is just the one which was estimated, in case 2 we have to count only the rate of BIT failures which are not repaired by the time the system fails. We omit the details here.

To get an improvement in the availability, the MTTR's have to be reduced, since here the failure rate of the subsystem is not influenced by the BIT's false alarms.
The shaded region in Figure 29 indicates the parameter combinations for this example for which BIT's give no improvement when the false alarm fraction is 10%. Since our BIT will in general be 2 to 10 times better than the original subsystem, $\lambda_{BITj}/\lambda_{sj} = 0.5$ to $0.1$, our BIT should have an MTTR of less than .9 to .7 times the subsystem MTTR to get some improvement. Obviously this is not very easy. Spare parts and the skill level of the personnel might very well prevent faster repairs. If this is the case, investing money in better subsystems might be more sensible.

We start the analysis with the worst subsystem. For quick orientation, we will use the nomogram in Figure 15. As soon as we get the percentage of improvement, we can evaluate its effect on the whole system by using equations (5.52):

$$\Delta A \approx \sum_{j} I_j\,\Delta A_j$$

which is again just a quick orientation. Obviously we can also proceed the other way around: if we need a certain system availability, we can use $I_j$ to approximately allocate the improvements. There are two things we have to keep in mind: first, this is just an approximation in terms of derivatives, and so finally we have to repeat the detailed calculations. Second, estimated failure rates for mature systems are optimistic.
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Figure  31:  Data  -  case  2,  3. 


In Figure 30 we sketched how we obtained the numbers required for the desired reliability. Note that it is not always possible, even with 100% BIT availability, to reach the desired block availability. The reason lies in our assumption that, in the long run, approximately 50% of the repair time goes to failure detection and isolation, so even a perfect BIT can at most halve the repair time. Also, the values are only estimates, because of the imprecision of reading them off the charts and because the false alarm fraction was only roughly taken as 10%.
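The ceiling can be made concrete with a small calculation. This sketch assumes the steady-state block availability $A = 1/(1 + \lambda \cdot \mathrm{MTTR})$ and that a perfect BIT removes at most the detection and isolation share of the repair time (about 50% under the assumption above); it ignores BIT failures and false alarms. The failure rate and MTTR are hypothetical numbers.

```python
def block_availability(failure_rate, mttr):
    """Steady-state availability 1 / (1 + lambda * MTTR) of a repairable block."""
    return 1.0 / (1.0 + failure_rate * mttr)

def best_case_with_bit(failure_rate, mttr, detect_isolate_share=0.5):
    """Upper bound: a perfect BIT eliminates only the detection/isolation share of MTTR."""
    return block_availability(failure_rate, mttr * (1.0 - detect_isolate_share))

lam, mttr = 0.02, 10.0  # hypothetical: failures per hour, hours
a0 = block_availability(lam, mttr)
amax = best_case_with_bit(lam, mttr)
print(f"without BIT: {a0:.4f}, best possible with BIT: {amax:.4f}")
```

For these numbers no BIT, however reliable, can lift the block above about 0.909, so a target of, say, 0.95 would be out of reach by BIT alone.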

When the actual BIT data are obtained (as before, we introduce arbitrary data just for the example), we can go back and check the percentage improvement in block availability. We see much less improvement than desired.

The last column in Fig. 31 gives the actual availabilities of the blocks, with the false alarm fraction x left as a variable. Every block will of course exhibit a different false alarm percentage, but to illustrate the influence of false alarms on the availability of the whole system, we evaluate:

$$A = \prod_{j=1}^{n} A_j = \frac{0.934}{(1.116+0.17x)(1.22+0.3x)(1.12+0.034x)(1.5+0.35x)(1.078+0.03x)}$$

$$A_{av} = \frac{1}{1+\sum_{i=1}^{n} U_i} = \frac{1}{2.141+0.883x}$$

Figure 32 represents the above equations. The average availability under assumption II (suspended animation) is always greater than the availability under the independence assumption, since the ratios $U_j = \lambda_j/\mu_j$ are positive:


$$A = \prod_{j=1}^{n} A_j = \frac{1}{\prod_{j=1}^{n}(1+U_j)} = \frac{1}{1+\sum_{j=1}^{n}U_j+\sum_{i<j}U_iU_j+\cdots} < \frac{1}{1+\sum_{j=1}^{n}U_j} = A_{av}$$


Figure  32:  System  Availability  as  a  function  of  common  FA-rate. 
Figure 32 shows very pessimistic results, as a consequence of the data used in the examples. We see, first, that the availability with BIT may be worse than without it, as discussed before. Second, and more important, with large false alarm rates the personnel will stop paying attention to the BIT, since the false alarms only confuse them; by ignoring the BIT'S and simply repairing the subsystems, they will actually increase the availability of the system. Note also that this example was deliberately chosen this way, and other cases may be more optimistic.

Case  1 

Here the false alarms also influence the failure rates, and we repeat the above analysis step by step:

$$A = \prod_{j=1}^{n} A_j = \frac{0.934}{(1.19+0.37x^*)(1.27+0.59x^*)(1.14+0.23x^*)(1.27+0.42x^*)(1.14+0.18x^*)}$$

$$A_{av} = \frac{1}{1+\sum_{i=1}^{n} U_i} = \frac{1}{2.08+1.79x^*}$$
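A quick consistency check shows how $A_{av}$ is assembled from the same factors as $A$. This sketch assumes, as an interpretation not stated in the text, that the factor 0.934 is the availability of one further block, so that $U_0 = 1/0.934 - 1$, and that each other factor equals $1 + U_j(x^*)$. Under that assumption the intercept and slope of $1/A_{av}$ should reproduce the 2.08 and 1.79 of the case 1 formula. The names used are hypothetical.

```python
from math import prod

# Case 1 factors (a_j + b_j * x*), each assumed to equal 1 + U_j(x*).
FACTORS_CASE1 = [(1.19, 0.37), (1.27, 0.59), (1.14, 0.23), (1.27, 0.42), (1.14, 0.18)]
A0 = 0.934  # assumed: availability of the remaining block, so U_0 = 1/A0 - 1

def unavailability_ratios(x):
    """U_j(x*) for every block, including the assumed block 0."""
    return [1.0 / A0 - 1.0] + [a - 1.0 + b * x for a, b in FACTORS_CASE1]

def a_independent(x):
    """Independence assumption: A = prod_j 1 / (1 + U_j)."""
    return prod(1.0 / (1.0 + u) for u in unavailability_ratios(x))

def a_suspended(x):
    """Suspended animation: A_av = 1 / (1 + sum_j U_j)."""
    return 1.0 / (1.0 + sum(unavailability_ratios(x)))

intercept = 1.0 / a_suspended(0.0)
slope = 1.0 / a_suspended(1.0) - intercept
print(f"A_av = 1/({intercept:.3f} + {slope:.3f} x*)")
```

The computed intercept is about 2.081 and the slope 1.79, matching the case 1 formula, while `a_independent` reproduces the product form of $A$.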


Figure 34: System availability as in Figure 32, for case 1 only.
VIII.  CONCLUSIONS


In the development of the model, the following assumptions were made to simplify the derivation and to enable use of the available databases:

♦ System configurations can be reduced to blocks consisting of one BIT and one subsystem.

♦ A failed BIT has the same effect on the block status as wrong BIT indications when the system is being maintained.

♦ Both the subsystem and the BIT are complex systems themselves, so approximately constant failure rates may be used.

No assumption was made about the distribution of repair times. Although a particular division of the MTTR among tasks was used, the same method can be applied to other applications with a different division.

The resulting model is:

♦ simple: easy to use and easy to understand;

♦ computable: the data for evaluation are standard, so all existing databases can be used, while the false alarm rate is treated as a variable, which allows the conclusions of other studies to be used;

♦ consistent: it predicts the situations encountered in practice, where ignoring the BIT may speed up repairs and so increase availability.
The examples presented show that built in tests should be introduced with care, and not everywhere or at all times. System availability is influenced most by the subsystem availability without BIT, so if false alarms cannot be kept to a minimum, or if BIT'S are not much better than the subsystems, it will be much more effective to invest in the availability of the basic subsystems than to use BIT'S. It will also be most productive to install BIT where digital circuitry is already available, so that costs are minimal. Since false alarms seriously degrade system availability, the failure isolation property is more useful than failure detection. The BIT'S are coming.


